Supervision Reduction by Encoding Extra Information about Models, Features and Labels
Abstract
Learning with limited supervision presents a major challenge to machine learning systems in practice. Fortunately, various types of extra information exist in real-world problems, characterizing the properties of the model space, the feature space and the label space, respectively. With the goal of supervision reduction, this thesis studies the representation, discovery and incorporation of extra information in learning. Extra information about the model space can be encoded as compression operations and used to regularize models in terms of compressibility. This leads to learning compressible models. Examples of model compressibility include local smoothness, compacted energy in frequency domains, and parameter correlation. When multiple related tasks are learned together, such a compact representation can be automatically inferred as a matrix-variate normal distribution with sparse inverse covariances on the parameter matrix, which simultaneously captures both task relations and feature structures. Extra information about the feature space can usually be conveyed by some form of feature reduction. We propose the projection penalty to encode any feature reduction without the risk of discarding useful information: a reduction of the feature space can be viewed as a restriction of the model search to a certain model subspace, and instead of directly imposing such a restriction, we can search in the full model space but penalize the projection distance to the model subspace. In multi-view learning, the projection penalty framework provides an opportunity to simultaneously address both overfitting and underfitting. Extra information about the label space can be extracted and exploited to improve multi-label predictions. To achieve this goal, we present error-correcting output codes (ECOCs) for multi-label classification: label dependency is represented by the most predictable directions in the label space and extracted by canonical correlation analysis (CCA) and its variants; the output code is designed to include these most predictable directions in the label space to correct prediction errors. Decoding of such codes can be efficiently performed by mean-field approximation and significantly improves the accuracy of multi-label predictions. Effective collection of supervision signals is an indispensable part of supervision reduction. In this thesis, we consider active learning for multiple prediction tasks when their outputs are coupled by constraints. A cross-task value-of-information criterion is designed, which encodes output constraints to measure not only the uncertainty of the prediction for each task but also the inconsistency of predictions across tasks. A specific example of this criterion leads to the cross entropy between the predictive distributions of coupled tasks, which generalizes the notion of entropy used in single-task uncertainty sampling.
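The last point of the abstract, that cross entropy between coupled tasks generalizes single-task entropy, can be illustrated with a minimal sketch. The abstract does not give the exact form of the criterion, so everything below is an assumption made for illustration: two tasks share the same label set and are coupled by an equality constraint (they should predict the same label), and the helper names `cross_entropy`, `entropy` and `most_informative` are hypothetical.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_k p_k * log(q_k), using the natural logarithm.

    eps guards against log(0); it is an implementation convenience,
    not part of the definition.
    """
    return -sum(pk * math.log(qk + eps) for pk, qk in zip(p, q))


def entropy(p, eps=1e-12):
    """Single-task uncertainty: H(p) is the special case H(p, p)."""
    return cross_entropy(p, p, eps)


def most_informative(preds_a, preds_b):
    """Index of the unlabeled example with the largest cross entropy.

    preds_a[i] and preds_b[i] are the predictive distributions of the two
    coupled tasks on example i (assumed setup, not taken from the thesis).
    """
    scores = [cross_entropy(p, q) for p, q in zip(preds_a, preds_b)]
    return max(range(len(scores)), key=scores.__getitem__)


# Three unlabeled examples scored by two coupled binary tasks.
preds_a = [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
preds_b = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
chosen = most_informative(preds_a, preds_b)
```

In this toy data, single-task entropy would rank the second example highest because both tasks are maximally uncertain there. The cross-entropy score instead selects the third example, where each task is confident but the two disagree, which is the kind of cross-task inconsistency the abstract describes. When both tasks output the same distribution, the score reduces to ordinary entropy.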
Related resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have proven effective in speech recognition systems for both feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation
The deficiency of segmentation labels is one of the main obstacles to semantic segmentation in the wild. To alleviate this issue, we present a novel framework that generates segmentation labels of images given their image-level class labels. In this weakly supervised setting, trained models have been known to segment local discriminative parts rather than the entire object area. Our solution is...
Learning with Limited Supervision by Input and Output Coding
In many real-world applications of supervised learning, only a limited number of labeled examples are available because the cost of obtaining high-quality examples is high or the prediction task is very specific. Even with a relatively large number of labeled examples, the learning problem may still suffer from limited supervision as the dimensionality of the input space or the complexity of th...
A Brief Introduction to Weakly Supervised Learning
Supervised learning techniques construct predictive models by learning from a large number of training examples, where each training example has a label indicating its ground-truth output. Though current techniques have achieved great success, it is noteworthy that in many tasks it is difficult to get strong supervision information like fully ground-truth labels due to the high cost of data lab...
Receptive Field Encoding Model for Dynamic Natural Vision
Introduction: Encoding models are used to predict human brain activity in response to sensory stimuli. The purpose of these models is to explain how sensory information is represented in the brain. Convolutional neural networks trained on images are capable of encoding magnetic resonance imaging data of humans viewing natural images. Considering the hemodynamic response function, these networks are ...